Optimization of stencil-based fusion kernels on Tera-flops many-core architectures

Author

  • Y. ASAHI
Abstract

We present the optimization of kernels from the fusion plasma codes GYSELA and GT5D on Tera-flops many-core architectures, including accelerators (Xeon Phi, Tesla K20X) and CPUs (FX100). Through the optimization, we found that a structure of arrays (SoA) style implementation is effective for SIMD operations on all architectures, and that high cache locality, which is achieved in GYSELA, is of critical importance on accelerators. In addition, OpenMP dynamic scheduling is effective in overcoming the drawbacks of in-order instruction processors such as Xeon Phi.
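
The two techniques named in the abstract, an SoA data layout and OpenMP dynamic scheduling, can be illustrated with a minimal sketch. This is not the GYSELA or GT5D code: the `GridSoA` type, the `stencil_rows` function, the 3-point stencil, and the chunk size of 1 are all assumptions chosen for illustration. Each field lives in its own contiguous array so the inner loop is unit-stride, and rows are distributed to threads at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SoA container: one contiguous array per field.
   An AoS layout would instead be an array of struct { double f, c, out; }. */
typedef struct {
    double *f;    /* field values, rows * cols, row-major */
    double *c;    /* per-point coefficient */
    double *out;  /* stencil result */
    size_t rows;
    size_t cols;
} GridSoA;

/* Illustrative 3-point stencil along each row:
   out[j] = c[j] * (f[j-1] - 2*f[j] + f[j+1]).
   The first and last column of every row are left untouched. */
void stencil_rows(GridSoA *g)
{
    assert(g->cols >= 3);

    /* Rows are handed to threads one at a time at run time.
       Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < (long)g->rows; ++i) {
        const double *f = g->f + (size_t)i * g->cols;
        const double *c = g->c + (size_t)i * g->cols;
        double *out = g->out + (size_t)i * g->cols;

        /* Unit-stride accesses to separate arrays: a loop shape that
           compilers can usually vectorize. */
        for (size_t j = 1; j + 1 < g->cols; ++j) {
            out[j] = c[j] * (f[j - 1] - 2.0 * f[j] + f[j + 1]);
        }
    }
}
```

With GCC or Clang, the pragma takes effect when compiling with `-fopenmp`. A suitable chunk size depends on how uneven the per-row cost is, so the value 1 here is only a starting point.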

Similar resources

Modeling the Performance of Geometric Multigrid on Many-core Computer Architectures

The basic building blocks of the classic geometric multigrid algorithm are all essentially stencil computations and have a low ratio of executed floating point operations per byte fetched from memory. On modern computer architectures, such computational kernels are typically bounded by memory traffic and achieve only a small percentage of the theoretical peak floating point performance of the u...

A Generalized Framework for Auto-tuning Stencil Computations

This work introduces a generalized framework for automatically tuning stencil computations to achieve superior performance on a broad range of multicore architectures. Stencil (nearest-neighbor) based kernels constitute the core of many important scientific applications involving block-structured grids. Auto-tuning systems search over optimization strategies to find the combination of tunable p...

Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and power efficiency. In this paper, we accelerate the forward-modeling technique on the latest multi-co...

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs

Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...

Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures

This paper presents a system for automatically supporting the optimization of stencil kernels on emerging Non-Uniform Memory Access (NUMA) many-core architectures, through a combined compiler + runtime approach. In particular, we use a pragma-driven compiler to recognize the special structures and optimization needs of stencil computations and thereby to automatically generate low-level code tha...

Publication date: 2015